Execution Templates: Caching Control Plane Decisions for Strong Scaling of Data Analytics

Authors

Omid Mashayekhi

Hang Qu

Chinmayee Shah

Philip Levis

Abstract

Control planes of cloud frameworks trade off between scheduling granularity and performance. Centralized systems schedule at task granularity, but can only schedule a few thousand tasks per second. Distributed systems schedule hundreds of thousands of tasks per second, but changing the schedule is costly. We present execution templates, a control plane abstraction that can schedule hundreds of thousands of tasks per second while supporting fine-grained, per-task scheduling decisions. Execution templates leverage a program's repetitive control flow to cache blocks of frequently executed tasks. Executing a task in a template requires sending a single message. Large-scale scheduling changes install new templates, while small changes apply edits to existing templates. Evaluations of execution templates in Nimbus, a data analytics framework, find that they provide the fine-grained scheduling flexibility of centralized control planes while matching the strong scaling of distributed ones. Execution templates support complex, real-world applications, such as a fluid simulation with a triply nested loop and data-dependent branches.
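To make the caching idea concrete, here is a minimal Python sketch of a toy controller that only counts control-plane messages. It is not Nimbus's implementation: the classes `Task`, `ExecutionTemplate`, and `Controller`, the methods `install`, `instantiate`, and `edit_move`, the message tuples, and the task name `advect` are all hypothetical. It also assumes that running an installed template costs one message per worker the template spans, and that moving one task between workers costs two messages (one to the old worker, one to the new).

```python
from __future__ import annotations

from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class Task:
    task_id: int
    function: str  # hypothetical name of the computation the task runs
    worker: int    # worker the controller placed this task on


@dataclass
class ExecutionTemplate:
    """A cached block of tasks recorded from one pass through a repeated loop body."""
    template_id: int
    tasks: dict[int, Task] = field(default_factory=dict)

    def workers(self) -> set[int]:
        return {t.worker for t in self.tasks.values()}


class Controller:
    """Toy controller (hypothetical) that only counts control-plane messages."""

    def __init__(self) -> None:
        self.templates: dict[int, ExecutionTemplate] = {}
        self.messages_sent = 0
        self.log: list[tuple[int, tuple]] = []

    def _send(self, worker: int, msg: tuple) -> None:
        self.messages_sent += 1
        self.log.append((worker, msg))

    def install(self, template: ExecutionTemplate) -> None:
        # Large scheduling change: every task description is shipped to its worker once.
        self.templates[template.template_id] = template
        for task in template.tasks.values():
            self._send(task.worker, ("install", template.template_id, task))

    def instantiate(self, template_id: int, params: dict) -> None:
        # Steady state (assumed cost model): one message per worker the template
        # spans, carrying only the template id and per-iteration parameters.
        template = self.templates[template_id]
        for worker in sorted(template.workers()):
            self._send(worker, ("run", template_id, params))

    def edit_move(self, template_id: int, task_id: int, new_worker: int) -> None:
        # Small scheduling change: patch the cached template in place
        # (remove on the old worker, add on the new one) instead of reinstalling.
        template = self.templates[template_id]
        old = template.tasks[task_id]
        template.tasks[task_id] = replace(old, worker=new_worker)
        self._send(old.worker, ("remove", template_id, task_id))
        self._send(new_worker, ("add", template_id, template.tasks[task_id]))


if __name__ == "__main__":
    ctrl = Controller()
    tasks = {i: Task(i, "advect", worker=i % 4) for i in range(1000)}
    ctrl.install(ExecutionTemplate(1, tasks))
    print(ctrl.messages_sent)  # 1000: paid once, one message per task
    ctrl.instantiate(1, {"iteration": 0})
    print(ctrl.messages_sent)  # 1004: 1000 tasks run for 4 messages
    ctrl.edit_move(1, task_id=7, new_worker=2)
    print(ctrl.messages_sent)  # 1006: a one-task reschedule costs 2 messages
```

Under this assumed cost model, installation grows with the number of tasks but is paid once, each later iteration grows only with the number of workers, and a small rescheduling touches just the workers involved rather than forcing a full reinstall.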


Similar papers

Scalable, Fast Cloud Computing with Execution Templates

Today, data analytics frameworks adopt one of two strategies to schedule their computations across workers. In the first strategy, systems such as Spark [3] use a centralized control plane, with a single node that dispatches small computations to worker nodes. Centralization allows a framework to quickly reschedule, respond to faults, and mitigate stragglers. However, the centralized controller...


ReCache: Reactive Caching for Fast Analytics over Heterogeneous Data

As data continues to be generated at exponentially growing rates in heterogeneous formats, fast analytics to extract meaningful information is becoming increasingly important. Systems widely use in-memory caching as one of their primary techniques to speed up data analytics. However, caches in data analytics systems cannot rely on simple caching policies and a fixed data layout to achieve good ...


Neutrino: Revisiting Memory Caching for Iterative Data Analytics

In-memory analytics frameworks such as Apache Spark are rapidly gaining popularity as they provide order of magnitude performance speedup over disk-based systems for iterative workloads. For example, Spark uses the Resilient Distributed Dataset (RDD) abstraction to cache data in memory and iteratively compute on it in a distributed cluster. In this paper, we make the case that existing abstracti...


PerfEnforce: A Dynamic Scaling Engine for Analytics with Performance Guarantees

In this paper, we present PerfEnforce, a scaling engine designed to enable cloud providers to sell performance levels for data analytics cloud services. PerfEnforce scales a cluster of virtual machines (VMs) allocated to a user in a way that minimizes cost while probabilistically meeting the query runtime guarantees offered by a service level agreement (SLA). With PerfEnforce, we show how to sc...


Clash of the Titans: MapReduce vs. Spark for Large Scale Data Analytics

MapReduce and Spark are two very popular open source cluster computing frameworks for large scale data analytics. These frameworks hide the complexity of task parallelism and fault-tolerance, by exposing a simple programming API to users. In this paper, we evaluate the major architectural components in MapReduce and Spark frameworks including: shuffle, execution model, and caching, by using a s...



Publication date: 2017
